Simulating Data to Study Performance of Finite Mixture Modeling and Clustering Algorithms
Abstract
A new method is proposed to generate sample Gaussian mixture distributions according to pre-specified overlap characteristics. Such methodology is useful for evaluating the performance of clustering algorithms. Our approach derives and calculates the exact overlap between every pair of clusters, measured in terms of their total probability of misclassification, and then guides the simulation of Gaussian components so that they satisfy the pre-specified overlap characteristics. The algorithm is illustrated in two and five dimensions using contour plots and parallel distribution plots, respectively; we introduce and develop the latter to display mixture distributions in higher dimensions. We also study properties of the algorithm and variability in the simulated mixtures. The utility of the suggested algorithm is demonstrated via a study of initialization strategies in Gaussian clustering.
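To make the overlap measure concrete, the following minimal sketch estimates the overlap between two Gaussian components as the sum of their two misclassification probabilities. It is not the paper's method: the abstract describes an exact calculation, whereas this sketch substitutes a plain Monte Carlo estimate. It also assumes diagonal covariance matrices to stay dependency-free. The function names and the `(weight, means, variances)` tuple layout are hypothetical, chosen only for illustration.

```python
import math
import random


def log_weighted_density(x, weight, means, variances):
    """Log of weight * N(x; means, diag(variances)). Diagonal covariance is an assumption."""
    log_pdf = 0.0
    for xk, mk, vk in zip(x, means, variances):
        log_pdf += -0.5 * (math.log(2.0 * math.pi * vk) + (xk - mk) ** 2 / vk)
    return math.log(weight) + log_pdf


def misclassification_rate(src, other, n_draws, rng):
    """Share of draws from `src` whose weighted density is higher under `other`."""
    w_s, mu_s, var_s = src
    w_o, mu_o, var_o = other
    wrong = 0
    for _ in range(n_draws):
        # Sample one point from the source component, coordinate by coordinate.
        x = [rng.gauss(m, math.sqrt(v)) for m, v in zip(mu_s, var_s)]
        if (log_weighted_density(x, w_o, mu_o, var_o)
                > log_weighted_density(x, w_s, mu_s, var_s)):
            wrong += 1
    return wrong / n_draws


def pairwise_overlap(comp_i, comp_j, n_draws=20000, seed=0):
    """Monte Carlo estimate of overlap: sum of the two misclassification rates."""
    rng = random.Random(seed)
    return (misclassification_rate(comp_i, comp_j, n_draws, rng)
            + misclassification_rate(comp_j, comp_i, n_draws, rng))


# Hypothetical example: two equal-weight, unit-variance components two units apart.
comp_a = (0.5, [0.0, 0.0], [1.0, 1.0])
comp_b = (0.5, [2.0, 0.0], [1.0, 1.0])
overlap = pairwise_overlap(comp_a, comp_b)
print(f"estimated overlap: {overlap:.3f}")
```

For this particular configuration the decision boundary is the midpoint between the means, so the exact overlap is 2Φ(−1) ≈ 0.317, which gives a quick sanity check on the estimate. A guided simulation of the kind the abstract describes would then adjust the component parameters until such pairwise values match the targets.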
Similar resources
MixSim: An R Package for Simulating Data to Study Performance of Clustering Algorithms
The R package MixSim is a new tool that allows simulating mixtures of Gaussian distributions with different levels of overlap between mixture components. Pairwise overlap, defined as a sum of two misclassification probabilities, measures the degree of interaction between components and can be readily employed to control the clustering complexity of datasets simulated from mixtures. These datase...
Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories
In recent years, the rapid growth of spatial trajectory data, together with the need to process it and extract useful information and meaningful patterns, has attracted many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...
Use of the Improved Frog-Leaping Algorithm in Data Clustering
Clustering is a well-known data mining technique in which data with similar properties are grouped into a set of categories. The K-means algorithm is one of the simplest clustering algorithms, but it has the disadvantages of being sensitive to the initial values of the clusters and of converging to a local optimum. In recent years, several algorithms based on evolutionary algorithms have been proposed for cluster...
Multi-layer Clustering Topology Design in Densely Deployed Wireless Sensor Network using Evolutionary Algorithms
Due to resource constraints and dynamic parameters, reducing energy consumption has become one of the most important issues in wireless sensor network (WSN) topology design. All proposed hierarchy methods cluster a WSN into different cluster layers in a single step of an evolutionary algorithm with complicated parameters, which may reduce efficiency and performance. In fact, in WSNs topology, increasin...
An improved opposition-based Crow Search Algorithm for Data Clustering
Data clustering is an ideal way of working with a huge amount of data and looking for structure in the dataset. In other words, clustering is the grouping of similar data: the similarity among the data within a cluster is maximal, and the similarity among the data in different clusters is minimal. The innovation of this paper is a clustering method based on the Crow Search Algorithm (CS...
Publication date: 2009